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Abstract.  Non-local  methods  for  image  denoising  and  inpainting  have 
gained  considerable  attention  in  recent  years.  This  is  in  part  due  to  their 
superior  performance  in  textured  images  and  regions,  a  known  weakness 
of  purely  local  methods.  Local  methods  on  the  other  hand  have  demon¬ 
strated  to  be  very  appropriate  for  the  recovering  of  geometric  structure 
such  as  image  edges.  The  synthesis  of  both  types  of  methods  is  a  trend 
in  current  research.  Variational  analysis  in  particular  is  an  appropriate 
tool  for  a  unified  treatment  of  local  and  non-local  methods.  In  this  work 
we  propose  a  general  variational  framework  for  the  problem  of  non-local 
image  inpainting,  from  which  several  previous  inpainting  schemes  can  be 
derived,  in  addition  to  leading  to  novel  ones.  We  explicitly  study  some  of 
these,  relating  them  to  previous  work  and  showing  results  on  synthetic 
and  real  images. 


1  Introduction 

Image  inpainting^  also  known  as  image  eompletion  or  disoeelusion^  is  an  active 
research  area  in  image  processing.  The  purpose  of  inpainting  is  to  obtain  a 
visually  plausible  image  interpolation  in  a  region  in  which  data  are  missing  due 
to  damage  or  occlusion.  Usually,  to  solve  this  problem,  the  only  available  data 
is  the  image  outside  of  the  region  to  be  inpainted.  In  addition  to  its  theoretical 
importance,  image  inpainting  is  a  very  important  problem  due  to  its  applications 
to  image  and  video  editing  and  restoration. 

Inpainting  methods  found  in  the  literature  can  be  classified  into  two  groups: 
geometry-  and  texture- oriented  methods. 

Geometry-oriented  methods.  Images  are  modeled  as  functions  with  some 
degree  of  smoothness  (expressed  for  instance  in  terms  of  the  curvature  of  the 
level  lines  or  the  total  variation  of  the  image) ,  and  the  interpolation  is  performed 
by  continuing  and  imposing  this  model  inside  the  inpainting  domain.  This  has 
been  performed  either  using  variational  techniques,  as  for  instance  in  [3, 11, 12, 
19,28,29],  or  with  PDEs  [4,7,36].  These  methods  show  a  good  performance  in 
propagating  smooth  level  lines  or  gradients.  However  they  fail  in  the  presence  of 
texture.  This  is  often  referred  to  as  strueture  or  eartoon  inpainting. 
Texture-oriented  methods.  Texture-oriented  inpainting  was  born  as  an  ap¬ 
plication  of  texture  synthesis,  e.g.,  [18,  21].  Its  recent  development  was  triggered 
in  part  by  [18, 37]  using  non-par ametric  sampling  techniques.  In  these  works  tex¬ 
ture  is  modeled  as  a  two  dimensional  probabilistic  graphieal  models  in  which  the 
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value  of  each  pixel  is  conditioned  by  its  neighborhood.  These  approaches  rely 
directly  on  a  sample  of  the  desired  texture  to  perform  the  synthesis. 

In  practice  these  methods  work  progressively  by  expanding  a  region  of  syn¬ 
thesized  texture.  The  value  for  a  target  pixel  x  is  copied  from  the  center  of  a 
square  patch  in  the  sample  image,  chosen  among  those  that  best  match  the  avail¬ 
able  portion  of  the  patch  centered  at  x.  Levina  and  Bickel  [26]  recently  provided 
a  probabilistic  theoretical  justification  for  this  strategy. 

This  method  (as  explained  above  or  with  some  modifications)  has  been  exten¬ 
sively  used  for  inpainting  [5,6,14,17,18,31].  As  opposed  to  geometry-oriented 
inpainting,  these  so-called  exemplar- based  approaches,  are  non-loeal:  To  deter¬ 
mine  the  value  at  x,  the  whole  image  may  be  scanned  in  the  search  of  a  matching 
patch. 

Since  these  texture  approaches  are  greedy  procedures  (each  hole  pixel  is  vis¬ 
ited  only  once),  the  results  are  very  sensitive  to  the  order  in  which  pixels  are 
processed  [14].  This  issue  was  addressed  in  [24, 38]  where  the  inpainting  problem 
is  stated  as  the  optimization  of  an  energy  derived  from  probabilistic  graphical 
models  (see  also  [25]). 

A  variational  justification  for  texture-based  methods  was  presented  in  [16], 
where  the  inpainting  problem  is  reformulated  as  that  of  finding  a  eorrespondenee 
map  r  :  O  ^  O*^,  O  being  the  inpainting  domain  and  its  complement  w.r.t. 
the  image  domain.  Denoting  the  image  by  the  inpainted  value  at  position 
X  G  O  is  then  given  hy  u{x)  =  u{r (x)) ,  F {')  being  the  correspondence  map.  The 
authors  proposed  a  continuous  energy  functional  in  which  the  unknown  is  the 
correspondence  map  itself: 

E{r)  =  [  [  {u{r{x  -  y))  -  u{r{x)  -  y)fdydx, 

Jo  JOp 

where  Qp  is  the  patch  domain  (centered  at  (0,  0)).  Thus  F  should  map  a  pixel  x 
and  its  neighbors  in  such  a  way  that  the  resulting  patch  is  close  to  the  one  cen¬ 
tered  at  F{x).  This  model  has  been  the  subject  of  further  (theoretical)  analysis 
by  Aujol  et  al.[l]. 

A  different  variational  model  was  presented  in  [32] .  Images  are  modeled  as  en¬ 
sembles  of  patches  on  a  given  patch  manifold.  For  inpainting,  the  patch  manifold 
can  be  learned  from  the  set  of  patches  in  the  hole’s  complement.  The  method  is 
iterative,  with  each  iteration  having  two  steps.  First,  the  patches  in  the  hole  are 
projected  onto  the  manifold.  Since  this  is  done  for  each  patch  independently,  the 
projected  patches  are  not  necessarily  coherent  with  each  other,  be.  over  lapping 
patches  may  have  different  values  in  the  overlap  region.  Therefore,  in  the  second 
step,  an  image  is  computed  by  averaging  the  patches  in  the  ensemble. 

Exemplar-based  methods  provide  impressive  results  in  recovering  textures 
and  repetitive  structures.  However,  their  ability  to  recreate  the  geometry  without 
any  example  is  limited  and  not  well  understood.  Therefore,  different  strategies 
have  been  proposed  which  combine  geometry  and  texture  inpainting  [5, 10, 17, 
23] .  These  methods  usually  decompose  the  image  in  some  sort  of  structure  and 
texture  components.  Structure  is  reconstructed  using  some  geometry-oriented 
scheme,  and  this  is  used  to  guide  the  texture  inpainting. 

Contributions  of  this  work.  Despite  these  combined  methods,  geometry  and 
texture  inpainting  are  still  quite  separate  fields,  each  one  with  its  own  analy- 
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sis  and  implementation  tools.  Variational  models  as  the  one  introduced  in  this 
paper  can  provide  common  tools  allowing  a  unified  treatment  of  both  trends. 
We  therefore  propose  a  variational  framework  for  non-local  image  inpainting  as 
a  contribution  to  the  modeling  and  analysis  of  texture-oriented  methods.  Our 
formulation  is  rather  general  and  different  inpainting  schemes  can  be  derived 
naturally  from  it,  via  the  selection  of  the  appropriate  patch  metric. 

In  the  present  work  we  study  three  of  them,  patch  NL-means,  -medians^  and 
-Poisson.  The  former  is  related  to  the  method  of  [38]  and  can  be  interpreted 
in  terms  of  the  mean  shift  [13]  and  the  manifold  models  of  [32].  The  other 
schemes  are,  to  the  best  of  our  knowledge  novel.  The  latter  imposes  coherence 
of  the  gradients,  in  addition  to  that  of  the  gray  levels,  which  implies  a  smoother 
continuation  of  the  information  across  the  boundary  and  inside  the  inpainting 
domain,  thus  acting  as  a  basic  local  regularization. 

Our  work  is  related  to  recent  variational  formulations  of  non-local  denois- 
ing  ([2,9])  by  Gilboa  and  Osher  [20].  The  image  redundancy  and  self-similarity 
(measured  as  patch  similarity)  is  encoded  by  a  non-local  weight  function  w  : 
O  X  ^  M.  This  function  serves  as  a  fuzzy  correspondence,  and  differs  from 
the  works  [1, 16],  although  a  (eventually  multivalued)  correspondence  map  can 
be  approximated  as  a  limit  of  our  model. 

Notation.  Images  are  denoted  as  functions  :  i?  ^  M,  where  i?  denotes  the 
image  domain,  usually  a  rectangle  in  Pixel  positions  are  denoted  by  x,  x', 
z'  OT  y,  the  latter  for  positions  inside  the  patch.  A  patch  of  u  centered  at  x  is 
denoted  by  Pu{x)  =  Pu{x^-)  :  Qp  ^  M,  where  Qp  is  a  rectangle  centered  at  (0,  0). 
The  patch  is  defined  by  Pu{x^y)  =  u{x  y),  with  y  E  f2p.  O  C  f2  is  the  hole  or 
inpainting  domain,  and  =  f2\0.  We  still  denote  by  u  the  part  of  the  image 
u  inside  the  hole,  while  u  is  the  part  oi  u  in  \  u  =  u\o<^ .  Additional  notation 
will  be  introduced  in  the  text. 

2  Variational  framework 

Our  variational  framework  is  inspired  by  the  following  non-local  functional 

Fw{u)  =  /  /  w{x^x'){u{x)  —  u{x'))‘^dx' dx.  (1) 

Jo  Jo^ 

The  weight  function  w  :  O  x  ^  measures  the  similarity  between  patches 
centered  in  the  inpainting  domain  and  in  its  complement.  Gaussian  weights  are 
commonly  used,  given  by  w{x^x')  =  exp  {—j^\\Pu{x)  —  Pu{x')\\‘^)  ,  where  ||  •  ||  is  a 
weighted  L2  norm  in  the  space  of  patches  and  h  is  the  scale.  A  similar  functional 
was  proposed  in  [20]  as  a  non-local  regularization  energy  in  the  context  of  image 
denoising  which  models  the  non-local  means  filter  [2,9]  (see  [35]  for  a  different 
model  of  non-local  means).  An  extension  to  super-resolution  is  presented  in  [34]. 

In  [20]  the  weights  are  considered  known  and  remain  fixed  through  all  the 
iterations.  While  this  might  be  appropriate  in  applications  where  the  weights 
can  be  estimated  from  the  noisy  image,  in  the  image  inpainting  scenario  here 
addressed,  weights  are  not  available  and  have  to  be  inferred  together  with  the 
image  ([?,33]).  One  of  the  novelties  of  the  proposed  framework  is  the  inclusion 
of  adaptive  weights  in  a  variational  setting. 


4 


For  this  reason,  we  will  consider  the  weight  function  w  as  an  additional 
unkown.  Instead  of  prescribing  explicitly  the  Gaussian  functional  dependence  of 
re  w.r.t.  14,  we  will  do  it  implicitly,  as  a  component  of  the  optimization  process.  In 
doing  so,  we  obtain  a  simpler  functional,  avoiding  to  deal  with  the  complex,  non¬ 
linear  dependence  between  w  and  u.  In  our  formulation,  re(x,  •)  is  a  probability 
density  function,  w{x,  x')dx'  =  1,  and  can  be  seen  as  a  relaxation  of  the  one- 
to-one  correspondence  map  of  [1, 16],  providing  a  fuzzy  correspondence  between 
each  X  G  O  and  the  complement  of  the  inpainting  domain. 

In  this  setting,  we  propose  an  energy  which  contains  two  terms,  one  of  them 
is  inspired  by  (1)  and  measures  the  coherence  between  the  pixels  in  O  and  those 
in  for  a  given  similarity  weight  function  w.  This  permits  the  estimation  of 
the  image  u  from  the  weights  w.  The  second  term  allows  us  to  compute  the 
weights  given  the  image.  The  complete  proposed  functional  is 


E{u,  w) 

(2) 

where 

Fn,{u)  =  f  i 

Jo  Jo^ 

w{x,x')\\pu{x)  -  Pu{x')\\a,cpdx'dx, 

(3) 

for  a  given 

norm-like  function 

•  \\a,(p  between  patches,  and 

Hw{x)  =  - 

-  /  w{x,x')\ogw{x,x')dx\ 

Jo^ 

is  the  entropy  of  the  probability  re(x,  •). 

We  take  O,  the  extended  inpainting  domain^  as  the  set  of  centers  of  patches 
that  intersect  the  hole,  i.e.O  =  O  f2p  =  {x  E  f2  :  (x  f2p)  HO  7^  0}.  Thus, 
patches  Pu{x')  centered  in  x'  G  are  entirely  outside  O  (Figure  1),  simplifying 
the  Euler-Lagrange  equation  for  the  minimizer.  Accordingly,  we  consider  that 
the  weight  function  w  is  defined  over  0x0^  and  w{x,x')dx'  =  1. 

For  a  simplified  presentation,  we  assume  that  O  +  C  j?,  ie. every  pixel 
in  O  supports  a  patch  centered  on  it  and  contained  in  i?.  This  is  not  true  if 
the  inpainting  domain  reaches  the  boundary  of  the  image,  and  details  on  the 
treatment  of  this  situation  are  given  in  Section  5.  Analogously,  we  also  shrink 
to  have  C  Q. 

Let  us  now  make  some  additional  comments  on  the  functional.  The  term 
{u{x)  —  u{x'))‘^  in  penalizing  differences  between  pixels,  is  substituted  by 
\\Pu{x)  —  Pu{x')\\a,(f‘  This  has  to  be  understood  together  with  the  inclusion  of 
the  second  term,  which  integrates  the  entropy  of  each  probability  re(x,  •)  over  O. 
For  a  given  completion  i4,  and  for  each  x  G  O,  the  optimum  weights  minimize 
the  mean  patch  error  for  Pu{x)^  given  by  w(x,  x')\\pu(x)  —  Pu{x')\\a,<pdx' , 
while  maximizing  the  entropy.  The  resulting  weights  are  Gaussian,  as  can  be 
confirmed  easily  by  derivating  the  energy.  This  can  be  related  to  the  prineiple 
of  maximum  entropy  [22],  widely  used  for  inference  of  probability  distributions. 
The  parameter  h  controls  the  trade-off  between  both  terms  and  is  also  the  scale 
parameter  of  the  Gaussian  weights.  Since  re(x,  •)  is  a  probability,  we  discard 
trivial  minima  of  E  with  w(x,x')  =  0  everywhere. 
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Patches  are  functions  defined  on  and 


The  patch  norm-like  function 

are  compared  using  ||  •  We  consider  a  non-decreasing  and  continuously 


differentiable  function  (p  :  IR+  ^  IR+  with  (/?(0)  =  0  and  define 
\\p\\a,<p=  [  ga{y)p{\p{y)\)<iy, 


by 


where  ga  is  an  intra-patch  weight  function,  a  Gaussian  centered  at  the  origin 
with  standard  deviation  a.  The  Li  and  the  squared  L2  norms  are  particular 
cases  of  ||  •  \\a,<p  when  p{t)  =  t  and  p{t)  =  respectively.  In  Section  3  we 
consider  another  norm  involving  derivatives  of  the  patch.  As  will  be  described 
below,  the  patch  norm  determines  not  only  the  similarity  criterion  but  also  the 
image  synthesis,  and  thus  is  a  key  element  in  the  framework. 

2.1  Probabilistic-geometric  model  interpretation 

The  proposed  model  can  be  written  in  terms  of  the  generalized  Kullback-Leibler 
divergence  [15].  Given  two  positive  and  integrable  functions  p,  g  defined  over  a 
certain  measure  space  T,  the  generalized  Kullback-Leibler  divergence  is  given  by: 

KL(p,  q)  =  fp^p{s)  log  ds— /^p(5)d5-h/;^g(5)d5,  assuming  that  the  integrals 

exist.  With  this  notation  (and  taking  into  account  that  u;(x,  •)  is  a  probability) 
the  functional  E  can  be  written  as 

E{u,w)  =  /  KL  (u;(x,  •),  r(x,  •))  dx  —  r{x,x')dx'dx, 

Jo  Jo  Jo^ 

where  r  is  the  Gaussian  weight  function  r(x,  x')  =  exp  {—^\\Pu{x)  —  Pu{x')\\a,(p) • 
The  first  term  integrates  the  divergence  between  the  functions  •)  and  r(x,  •), 
for  each  x  e  O.  The  second  term  can  be  interpreted  by  noticing  that 


q{x)  =  r{x^x')dx' 

Jo^ 


(4) 


is  a  density  estimate  (in  the  patch  space)  of  the  set  of  patches  in  O^:  The  higher 
the  amount  of  patches  in  close  to  Pu{x)  (according  to  the  scale  parameter  h), 
the  higher  the  value  of  q. 

The  minimizers  (i4*,u;*)  are  obtained  when  re*  (x,  x')  =  r*(x,  x')/g*(x),  (Gaus¬ 
sian  weights  normalized  by  (4)),  and  the  patches  of  the  inpainted  image  are  in 
regions  of  high  density  in  the  patch  space.  This  provides  a  geometric  intuitive 
interpretation  of  our  variational  formulation.  The  image  is  considered  as  an  en¬ 
semble  of  overlapping  patches.  Known  patches  in  are  fixed,  forming  a  patch 
density  model  used  to  estimate  the  patches  in  O.  The  richness  of  the  frame¬ 
work  is  given  in  part  by  the  fact  that  different  norms  in  the  patch  space  induce 
inpainting  schemes  of  different  nature,  as  we  are  going  to  see  next. 


2.2  Minimization  of  E 

We  have  formulated  the  inpainting  problem  as  the  constrained  optimization 

{u*^w*)  =  ai g min E{u^w)  subject  to  w{x^x')dx'  =  1  Vx  G  O.  (5) 

u,w  J  O^ 
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To  minimize  the  energy  we  use  an  alternate  coordinate  descent  algorithm. 
At  each  iteration,  two  optimization  steps  are  solved:  The  constrained  minimiza¬ 
tion  of  E  with  respect  to  w  while  keeping  u  fixed;  and  the  minimization  of  E  with 
respect  to  u  with  w  fixed.  This  procedure  yields  the  following  iterative  scheme 

1.  [Initial  Condition]  Given  uo{x)  with  x  G  O. 

2.  [Weights  Update]  Wk  =  argmin^  re),  subject  to  J^w{x,x')dx'  =  1. 

3.  [Image  Update]  Uk-\-i  =  SiTgmm^  E{u,Wk). 

4.  [Stopping  Criterion]  If  ||r^/c+i  —  Uk\\  >  r,  go  back  to  step  2. 


In  the  weights  update  step,  the  minimization  of  ^  w.r.t.  re  yields: 


Wk{x,x') 


-^WPuAx) -Pu{x') 


The  normalizing  factor  q{x)  is  the  density  estimate  given  by  (4),  for  patch  {x). 

The  parameter  h  determines  the  selectivity  of  the  similarity.  If  h  is  large, 
maximizing  the  entropy  becomes  more  relevant,  yielding  weights  which  are  less 
selective.  In  the  limit,  when  h  oo,  w{x^-)  becomes  a  uniform  distribution 
over  O^.  On  the  other  hand,  a  small  h  yields  weights  which  concentrate  on  the 
patches  close  to  Pu{x).  In  fact,  as  we  will  mention  later  on,  in  the  limit  as  h  ^  0, 
w{x.')  can  be  considered  as  an  approximation  to  an  (eventually  multivalued) 
correspondence. 

The  image  update  step  deserves  more  attention  and  is  described  next. 


Image  update  step.  We  now  detail  the  derivation  of  the  image  update  step  for 
the  cases  ip{t)  =  C  and  ip{t)  =  t.  We  refer  to  the  resulting  algorithms  as  pateh- 
wise  non-loeal  means  (patch  NL-means),  and  medians  (patch  NL-medians),  re¬ 
spectively. 

Pateh-wise  non-loeal  means.  If  (p{t)  =  C  the  image  energy  term  is  quadratic  on 
u,  and  its  minimum  for  fixed  weights  w  can  be  computed  explicitly  leading  to  a 
non-local  average: 

=  tAt  /  9o.{y)  [  w{x -y,z')u{z' +  y)dz'dy,  (6) 

C{x)  Jo- 

for  each  x  G  O,  where  the  normalization  constant  C{x)  is  given  by  C{x)  = 
9a{y)^y  =  the  area  of  the  patch  (measured  according  to  pa). 

Figure  1  explains  this  equation.  The  value  at  x  considers  all  patches  con¬ 
taining  X.  For  instance  the  patch  Pu{x  —  y)  covers  x,  Pu{x  —  y^y)  =  u{x).  This 
patch  is  compared  to  all  patches  in  the  complement,  Pu{z')^  yielding  the  weights 
w{x  —  p,  z').  Each  of  these  patches  contributes  the  term  w{x  —  y,  z')u{z'  +  y)  to 
the  average,  be. its  value  at  position  y  weighted  by  w{x  —  y^z'). 

Pateh-wise  non-loeal  medians.  We  now  consider  the  Li  norm  in  the  energy  E, 
corresponding  to  (/?(t)  =  t.  The  Euler  equation  for  given  the  weights  re,  is 

SuE{u,w){z)  =  /  gaiy)  I  w{z  -  y,x')sign[u{z)  -  u{x' +  y)]dx'dy  =  0.  (7) 

jQp  JO^ 


7 


z'  +  y 


Fig.  1.  Patch-wise  non-local  means  inpainting.  The  value  at  x  G  O  is  computed  using 
all  the  patches  that  overlap  x.  The  patch  centered  oX  x  —  y  contributes  with  the  term 
w{x  —  y,  z)u{z'  +  y)  to  the  average  for  each  z'  ^  O^. 


The  solution  of  this  equation  is  given  by  a  weighted  median  of  the  values  outside 
the  hole.  We  can  see  this  easily  by  defining  z'  =  x'  y  and  rewriting  Eq.  (7)  as 


where 


sign[u(2)  -  u{z')\pz{z’)Az’ , 


-  y)9a{y)w{z  -y,z'  -  y)dy. 


(8) 


For  a  given  ^  G  O,  the  function  pz  :  ^  weights  the  contribution  of  each 

location  z'  to  the  median.  The  quantity  pz{z')  is  computed  by  integrating  the 
similarity  w{z  —  y^z'  —  y)  between  all  patches  that  overlap  z'  and  those  that 
overlap  2;  in  the  same  relative  position.  It  tells  us  how  much  evidence  there  is 
supporting  u{z')  as  the  intensity  value  for  2:.  The  function  Xqc  takes  the  value 
1  on  and  0  on  O. 


2.3  Revisiting  related  work 

We  conclude  this  section  by  further  connecting  our  work  with  previous  art.  The 
method  in  [38]  is  closely  related  to  the  patch  NL-means  scheme  of  Eq.  (6).  The 
key  difference  lies  in  the  underlying  theoretical  model.  The  problem  is  addressed 
as  a  MRF,  where  pixels  outside  the  hole  are  observable  variables,  missing  pixels 
in  the  hole  are  the  parameters,  and  the  hidden  variables  are  given  by  the  cor¬ 
respondence  r  :  O  which  assigns  a  patch  outside  the  hole  to  each  x  in 

O.  The  method  can  be  seen  as  an  approximate  EM  algorithm  for  maximizing 
the  log-likelihood  w.r.t.  the  pixels  in  O,  and  some  approximations  have  to  be 
taken  to  make  the  optimization  tractable.  Based  on  heuristics,  the  authors  also 
propose  to  use  more  robust  estimators  than  the  mean  for  the  synthesis  of  pixels. 
With  the  framework  here  proposed,  robust  estimators  (as  the  median)  naturally 
result  from  particular  choices  of  the  patch  norm  ||  •  \\a,(f 

The  patch  NL-means  algorithm  is  also  related  to  the  interesting  manifold  im¬ 
age  models  of  [32].  Eq.  (6)  can  be  split  into  two  steps  which  are  analog  to  Peyre’s 
manifold  and  image  projection  steps.  First,  for  each  patch  centered  in  O  we  com¬ 
pute  a  new  patch  as  a  weighted  average  of  all  patches  in  the  complement,  accord¬ 
ing  to  the  patch  similarity  weights  p^^{z)  :=  w{z^  z')pu{z')dz'  with  2;  G  O. 
Doing  this  for  each  hole  position  yields  an  incoherent  ensemble  of  patches.  The 
image  is  obtained  by  averaging  these  patches:  u{z)  =  ~  Vi 


We  use  a  density  model,  instead  of  the  manifold  model  of  [32].  Indeed,  p^^{x) 
is  the  mean  shift  operator  applied  to  Pu{x).  It  is  known  that  the  iteration  of  this 
operator  corresponds  to  an  adaptive  gradient  ascent  of  the  Parzen  estimate  of  a 
PDF  [13],  which  in  this  case  is  generated  by  the  set  of  patches  in  the  complement 
of  the  hole.  The  use  of  a  density  model  entails  some  advantages,  mainly  from 
the  computational  point  of  view,  learning  a  manifold  model  is  computationally 
costly.  Furthermore,  the  assumption  that  patches  he  on  a  manifold  is  question¬ 
able  (one  could  think  for  instance  in  a  stratification  as  a  more  realistic  model), 
and  its  dimension  is  hard  to  determine  for  real  images. 


3  Higher  order  variational  models 


The  proposed  variational  framework  allows  the  introduction  of  derivatives  of  the 
image,  by  considering  them  in  the  patch  norm  used  in  (3).  In  this  section  we 
study  a  functional  using  the  L2  norm  of  the  gradients  of  the  patches, 

\\piy}\\l,v=  [  9a{y)\\'^p{y}\\ldy, 

where  ||  •  II2  is  the  Euclidean  norm  in  Firstly,  the  similarity  weights  are 
now  based  on  patch  gradients,  and  secondly,  the  image  update  step  is  given  by 
a  non-local  Poisson  equation,  be.a  Poisson  equation  with  non-local  coefficients. 
The  functional  is  obtained  by  substituting  in  (2)  the  image  energy  term  Fw{u)  = 
fd  fd^  x')\\pu{x)  —  Pu(^0lla  assume  that  u\oc  =  u). 

The  Euler  equation  w.r.t.  u  of  the  resulting  functional  is 


V-  [C(z)V^(z)]  =  V-t;(z), 

for  all  z  G  O,  where  u\oc  =  u  and  the  field  t;  :  O  ^  is  given  by 


(9) 


v{z) 


y,  x')Vu{x'  +  y)dx'dy. 


The  solutions  are  minimizers  of  J^C{z)\\\/u{z)  —  v{z)\\2dz  (as  before,  C{z)  = 
A{f2p)).  Therefore,  u  is  computed  as  the  image  with  the  closest  gradient  (in  the 
1/2  sense)  to  the  guiding  vector  field  t;,  which  corresponds  to  a  non-local  weighted 
average  of  the  gradients  in  the  complement.  The  coefficients  in  the  average  have 
exactly  the  same  form  as  in  (6).  The  only  difference  is  that  the  patch  similarity 
weights  used  here  are  Gaussian  weights  of  the  L2  norm  of  the  gradients.  See  [30] 
for  further  uses  of  the  Poisson  equation  in  image  editing. 

This  energy  can  be  combined  with  the  patch  NL-means  energy  by  considering 
a  linear  combination  of  the  corresponding  image  energy  terms.  The  resulting 
scheme  computes  the  weights  based  on  the  image  together  with  its  gradient,  and 
updates  the  image  by  solving  a  linear  combination  of  Eqs.  (6)  and  (9). 


4  Confidence  mask 

Eor  large  inpainting  domains,  it  is  useful  to  introduce  a  mask  /^  :  i?  ^  (0, 1] 
which  assigns  a  confidence  value  to  each  pixel,  depending  on  the  certainty  of  its 
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information  (see  also  [14,24]).  This  will  help  in  guiding  the  flow  of  information 
from  the  boundary  towards  the  interior  of  the  hole,  eliminating  some  local  min¬ 
ima  and  reducing  the  effect  of  the  initial  condition.  The  resulting  image  energy 
term  takes  the  form 


Fyj{u) 


k{x)w{x,x')\\pu{x)  -  Pu{x')\\a^ipdx'Ax 


where  n  modulates  the  penalization  of  the  incoherences  between  w  and  the  (f- 
norm  between  patches. 

The  effect  of  n  on  the  image  update  step  is  easier  to  visualize  on  the  evi¬ 
dence  function  Eq.  (8).  Recall  that  this  function  gathers  all  evidence  sup¬ 
porting  u{z')  as  a  value  for  u{z)^  for  each  z'  G  O^.  As  in  Eq.  (8),  now  tak- 
ing  K  into  account,  we  obtain  p^,z{z')  =  Jq^Xo^O  ~  y)9a{y)K{z  -  y)w{z  - 
y^z'  —  y)dy.  Thus,  the  contribution  of  the  patch  Pu{z  —  v)  to  the  evidence 
function  is  now  weighted  by  its  confidence.  Patches  with  higher  confidence  will 
support  stronger  evidence.  In  this  case  the  weights  are  given  by  w{x^x')  = 

^exp  (^-^\\pu{x)  -  P{,{x')\\a,jj. 

The  inclusion  of  the  confidence  mask  modifies  the  patch  space  scale  parameter 
h.  If  the  confidence  is  high,  the  effective  scale  h/ n{x)  will  be  lower,  thus  increasing 
the  selectivity  of  the  similarity  measure.  If  the  information  at  x  is  uncertain, 
more  patches  are  considered  similar.  The  same  reasoning  applies  to  the  patch 
NL-Poisson  energy,  with  similar  modifications  to  Eq.  (9). 


5  Experimental  results 

We  tested  the  proposed  methods  with  gray  scale  and  color  images.  The  energy 
for  the  latter  can  be  obtained  by  considering  a  norm  for  color  patches  that 
adds  the  norms  of  the  three  scalar  components:  \\p^{x)\\a^ip  =  \\Pui{x)\\a,cp^ 

where  4?  :  i?  ^  is  the  color  image,  and  Ui^  with  i  =  1,2,3,  its  components 
(analogously  for  ||  •  ||a,v)-  Thus,  the  weights  will  take  into  account  the  three 
channels.  Given  the  weights,  each  channel  is  updated  using  the  corresponding 
scheme  for  scalar  images.  All  channels  are  updated  using  the  same  weights.  This 
scheme  can  be  applied  to  any  Euclidean  color  space.  We  show  results  with  RGB 
and  GIE  La*b*  color  spaces. 

In  our  implementation  we  use  a  square  patch  domain  Qp  of  side  5  G  N, 
with  the  Gaussian  intra-patch  weights  ga  centered  on  it.  Eor  all  experiments  we 
set  s  =  3a  {s  should  be  chosen  such  that  most  of  the  effective  support  of  the 
Gaussian  fits  in  the  patch,  we  used  a  smaller  5  to  lower  the  computational  cost). 
This  leaves  only  two  independent  parameters,  namely,  the  intra-patch  Gaussian 
width  a,  and  the  patch  similarity  scale  h.  The  former  determines  the  size  of  the 
patch,  a  parameter  inherent  to  all  patch-based  techniques.  It  should  be  large 
enough  so  as  to  allow  the  identification  of  the  image  patterns. 

In  the  limit  when  h  ^  0,  we  compute  the  weights  as  limh^oWh{x,x')  = 
-^^^6{x' —  n{x)),  where  n{x)  C  is  the  set  of  nearest  neighbors  of  x,  i.e.n{x)  = 
{x'  G  :  \\pu{x)  —  Pu{x')\\a,<p  =  Sn},  where  6n  represents  the  nearest  neighbor 
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distance.  In  practice,  we  assume  that  #n(x)  =  1,  ie.the  nearest  neighbor  is 
unique.  The  choice  of  this  parameter  will  be  addressed  later. 

The  confidence  mask,  when  used,  adds  another  parameter.  We  found  good 
results  using  the  following  function: 

=  J  (1  -  f^o)e  Ko  if  X  e  O, 

\  1  if  X  G  O^, 

which  shows  an  exponential  decay  w.r.t.  the  distance  to  the  boundary  inside 
the  hole  d(-,90),  where  >  0  is  the  decay  time  and  kq  >  0  determines  the 
asymptotic  value  reached  far  away  from  the  boundary. 

If  a  patch  centered  in  the  inpainting  domain  does  not  fit  in  the  image,  we 
mirror  the  image  w.r.t.  the  boundary  to  complete  the  patch.  Whenever  the  hole 
reaches  df2,  the  Poisson  equation  requires  a  different  boundary  condition.  We 
have  considered  Neuman  boundary  conditions,  i.e.Vu'n(x)  =  0,  for  x  e  Ondf2, 
where  n(x)  is  the  normal  direction  at  the  boundary.  This  amounts  again  to  a 
reflection  of  u  w.r.t.  df2. 

The  computational  cost  of  each  iteration  is  0{A{0)  x  A{0^)  x  s^).  This  is 
typical  of  non-local  methods,  and  several  strategies  can  be  used  for  speed-up  [8, 
27]. 

Figure  2  compares  the  results  of  the  three  methods  on  a  texture  with  two 
different  mean  intensities,  darker  on  the  right  half  of  the  image.  The  inpainting 
domain  hides  all  patches  on  the  boundary  between  the  dark  and  bright  textures. 
With  this  we  can  test  the  ability  of  each  method  to  create  an  interface  between 
both  regions.  Situations  like  these  are  common  in  real  inpainting  problems  due 
for  instance  to  shadows.  Moreover,  when  inpainting  non-regular  textures,  a  good 
completion  may  not  be  possible  just  by  copying,  and  creating  new  patterns 
becomes  necessary  (see  Figure  3). 

We  have  also  added  Gaussian  noise  with  standard  deviation  a  =  10  to  show 
the  influence  of  the  patch  space  scale  parameter  h.  Figure  2  shows  two  results  for 
each  scheme,  one  with  h  =  0,  and  the  other  with  a  higher  h,  chosen  empirically 
for  each  method. 


Fig.  2.  Results  with  s  =  15  and  a  =  5.  The  first  four  columns  correspond  to  the  initial 
condition,  result  of  path  NL-medians,  -means  and  -Poisson.  Top  row,  /i  =  0,  bottom 
row  h  =  0.01,  h  =  0.05  and  h  =  0.04,  respectively.  The  used  intra-patch  weight  kernel 
Qa  is  shown  in  each  figure  on  the  bottom  right.  The  fifth  column  shows  the  value  of 
the  images  for  a  horizontal  line  going  between  the  circles. 


The  rightmost  column  in  Figure  2  plots  the  image  values  for  a  horizontal  line 
between  the  circles.  The  interpolation  done  by  the  patch  NL-Poisson  method  is 
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linear,  since  this  is  a  solution  of  the  homogeneous  Poisson  equation.  The  profile 
shown  by  patch  NL-means  shows  a  smooth  transition  when  both  regions  meet, 
whereas  the  use  of  the  Li  norm  yields  a  sharp  edge.  The  results  using  a  higher 
h  show  some  denoising,  since  for  larger  h,  more  patches  are  regarded  as  similar 
to  each  patch  in  O  and  each  pixel  value  is  synthesized  from  more  complement 
pixels.  For  inpainting  of  noiseless  images,  we  use  h  =  0. 

The  top  row  of  Figure  3  shows  results  with  the  three  schemes  for  a  non-regular 
texture.  The  result  with  patch  NL-medians  shows  image  regions  copied  without 
any  modification.  The  boundaries  between  these  regions  are  determined  so  that 
each  patch  on  the  boundary  is  close  to  some  patch  outside  the  hole.  This  does  not 
always  yields  a  smooth  transition.  Copied  patterns  can  also  be  seen  in  the  result 
of  the  patch  NL-means,  but  the  copies  are  less  sharp,  and  the  discontinuities  less 
noticeable.  The  patch  NL-Poisson  shows  a  better  continuation  of  the  color  at  the 
boundary  of  the  hole.  However  the  inpainted  structure  looks  too  blurry  (zoom 
on  the  pdf  file  for  details). 


1^ 

B 

Kxjcxn  A 

K  2 

k:  X. 

cxxxxx: 

cxxxxx: 

<  u  u  4.11  W  W  ^ 

KXXX A A 

KXXXXX 

KXXXXX: 

cxxxxx: 

CXXXXM 

<  U  U  4.11  W 

RXXA A A 

kxxxxx 

Kxxxxx: 

cxxxxx: 

CXXXXM 

<  u  u  4i^  w  mm 

irXAAA  A 

^xxxxx 

kxxxxx; 

cxxxxx: 

CXXXXI>*^ 

<  u  u  4i^  W  kid 

Fig.  3.  From  left  to  right:  Initial  condition,  result  of  path  NL-medians,  -means  and 
-Poisson.  Top:  Results  with  s  =  15,  a  =  5  and  /i  =  0,  using  the  CIE  La*b*  color  space. 
Bottom:  Results  with  s  =  25,  a  =  8  and  /i  =  0.  Gray  scale  image. 


The  bottom  row  of  Figure  3  depicts  results  on  a  regular  texture.  The  reg¬ 
ularity  of  the  texture  hides  the  blurring  effects  of  the  L2  metrics  (both  on  the 
image  and  gradients).  At  a  stable  state,  all  patches  overlapping  on  a  pixel  will 
agree  on  its  value.  Notice  that  in  this  case,  the  patch  NL-Poisson  is  able  to  re¬ 
construct  the  illumination  gradient  of  the  image.  This  is  imposed  to  the  solution 
of  the  Poisson  equation  by  the  boundary  conditions.  In  addition  to  the  non-local 
inpainting,  this  scheme  performs  also  a  local  interpolation  based  on  the  hole’s 
boundary.  Instead,  the  other  methods  copied  the  information  from  the  bottom 
of  the  image,  generating  a  discontinuity  at  the  top. 

The  results  shown  in  Figure  4,  were  computed  using  a  confidence  mask  shown 
at  the  leftmost  column.  In  both  cases  the  patch  NL-medians  scheme  yields  the 
best  results,  comparable  to  state  of  the  art  (see  results  in  [17,24]  for  results  on 
the  same  images).  The  images  look  as  a  composition  of  copied  regions  (although 
some  parts  in  the  elephant  image  seem  new).  Again  the  patch  NL-means  shows 
blurred  results,  most  noticeable  for  elephant  due  to  the  non-regularity  of  the 
textures.  The  patch  NL-Poisson  fails  with  this  image.  In  this  case  the  gradient 
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is  not  a  good  feature  for  computing  the  weights.  However,  the  results  for  Figure 
4  are  still  reasonable,  it  did  recover  the  structure  of  the  image.  Due  to  the 
averaging  of  gradients,  when  overlapping  patches  do  not  agree  on  the  value  of  a 
pixel,  lower  gradients  may  appear.  These  generate  phantom  edges  surrounding 
the  cylinders.  Presumably  a  more  robust  estimation  of  the  gradient  would  not 
have  this  problem.  We  are  currently  developing  a  scheme  using  the  Li  norm 
between  the  gradients  of  patches. 

The  initial  condition  for  elephant  is  the  original  image,  whereas  for  eylinders 
the  hole  was  filled  with  128  as  constant  gray  level.  A  confidence  mask  k  with  low 
confidence  inside  the  hole  helps  in  diminishing  the  infiuence  of  the  initialization. 
Further  results  are  available  at:  http://gpi.upf.edu/static/vnli. 


Fig.  4.  From  left  to  right,  inpainting  domain  with  confidence  mask,  result  of  path  NL- 
medians,  -means  and  -Poisson  (the  latter  only  for  the  first  row).  Top:  cylinders-  Results 
with  s  =  27,  a  =  9  and  /i  =  0  for  patch  NL-medians  and  -means  and  s  =  33,  a  =  1 
and  h  =  0  patch  NL-Poisson.  Bottom:  elepahnt-  Results  with  s  =  19,  a  =  6  and  h  =  0. 
The  parameters  of  the  confidence  mask  are  tq  =  5  and  no  =  0.4  in  all  cases  except  for 
the  patch  NL-medians  with  the  bottom  image,  in  which  we  set  kq  =  0.1.  Results  using 
RGB.  Please  refer  to  [17,  24]  for  other  results  on  the  same  images. 


6  Conclusions  and  future  work 

In  this  work  we  present  a  variational  framework  for  non-local  image  inpaint¬ 
ing.  The  proposed  energy  lends  itself  to  intuitive  interpretations,  and  contrary 
to  previous  variational  models,  allows  a  straightforward  minimization  using  a 
coordinate  descent  scheme.  Beyond  the  specific  application  of  inpainting,  this 
framework  provides  also  a  sound  variational  modelling  of  non-local  regulariz- 
ers  with  adaptive  weights,  extending  previous  work  in  which  the  weights  are 
considered  known  and  fixed. 

Starting  from  this  model,  we  derived  three  different  inpainting  schemes,  each 
one  corresponding  to  a  different  norm  measuring  the  distance  between  patches. 
We  showed  results  on  synthetic  and  natural  images  comparing  their  properties. 

The  derived  pateh  NL-means  provides  a  variational  interpretation  of  the 
methods  proposed  by  [32,38].  The  pateh  NL-medians  is  the  one  that  showed 
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the  best  overall  performance,  comparable  with  the  state  of  the  art.  The  results 
obtained  suggest  a  possible  relation  with  the  piece- wise  traslation  models  of  [1, 
16].  The  patch  NL-Poisson  presents  two  interesting  features.  First,  the  similar¬ 
ity  weights  are  computed  based  on  the  gradients,  allowing  the  transference  of 
information  from  areas  with  different  intensity  level.  Second,  the  image  com¬ 
pletion  is  the  result  of  a  Poisson  equation,  thus  incorporating  some  basic  local 
regularization,  meaning  the  completion  must  be  differentiable  and  its  gradient 
squared  integrable.  This  traduces  in  a  local  interpolation  based  on  the  image 
values  at  the  boundary  of  the  inpainting  domain.  This  method  performs  well  for 
structured  textures,  but  fails  for  non-regular  textures. 

We  are  currently  exploring  several  additional  aspects  of  this  framework,  in¬ 
cluding  the  use  of  robust  norms  in  the  general  (p  setting  and  the  Li  norm  between 
patch  gradients. 
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